Rank | Count | Beginning |
---|---|---|
6060 | 4381 | De |
4332 | 1714 | Dat |
15128 | 1364 | He |
17354 | 1275 | In |
11406 | 714 | Dor |
24479 | 639 | Se |
2090 | 556 | As |
3082 | 417 | Bi |
8350 | 383 | Demografie |
16714 | 377 | Historie |
25185 | 349 | Sien |
28454 | 346 | Vun |
21509 | 343 | Mit |
13574 | 329 | En |
22977 | 307 | Ok |
20859 | 276 | Man |
21957 | 258 | Na |
18590 | 256 | In’n |
25745 | 217 | So |
11120 | 197 | Disse |
22240 | 181 | Nah |
26490 | 180 | To |
29006 | 178 | Wat |
1613 | 177 | An’n |
14227 | 176 | För |
29345 | 159 | Wenn |
20443 | 155 | Leven |
11429 | 154 | Dorbi |
1350 | 153 | An |
28153 | 149 | Von |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
SELECT substring_index(sentence, ' ', 1) AS beg, count(*) AS cnt
FROM sentences
GROUP BY substring_index(sentence, ' ', 1)
ORDER BY cnt DESC
LIMIT 50;
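The same count can be reproduced outside the database. The following is a minimal Python sketch, not the tooling behind this report. It assumes the sentences are available as a list of strings; the function name `sentence_beginnings`, its parameters `n` and `limit`, and the sample sentences are all hypothetical and chosen for illustration only. With `n=1` it mirrors the query above, and larger values of `n` correspond to the N = 2, 3, 4 cases.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: mirrors substring_index(sentence, ' ', n)
    followed by GROUP BY / ORDER BY cnt DESC / LIMIT in the query.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep everything before the n-th space; if the sentence has
        # fewer than n spaces, the whole sentence is kept.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by count, descending, and truncates to `limit`.
    return counts.most_common(limit)


# Invented toy input, not taken from the corpus.
sample = ["De Hund bellt.", "Dat is so.", "De Katt slöppt."]
print(sentence_beginnings(sample))  # [('De', 2), ('Dat', 1)]
```

As in the query, words are split on single spaces only, so punctuation attached to a word stays part of the beginning. The order of entries with equal counts is not meaningful in either version.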
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV